When I started using the WEKA Java library to classify ads for my employer some 10 years ago, I was not aware of the term “data scientist”. I considered myself a programmer who had to learn some new skills in order to use the WEKA library.
Time passed, and although I was heavily involved in machine learning and data science projects and had learned R and was using it on a daily basis, I still called myself a programmer, because the term “data scientist” was not familiar even to my fellow programmers.
Over the last few years, data science, machine learning, and AI have received a lot of attention, and it has finally become easy to explain what I do for a living just by saying “I am a data scientist”.
However, this expansion was (and still is) very fast, and the term has become blurred. Companies often have difficulty understanding what they actually need, so they just advertise it as a “data scientist job offer”. Sometimes they just want to advertise a boring job with a fancy title, but typically the problem is a lack of understanding of what you can expect from ML and data science projects these days.
Here’s how I would define different positions in a modern DS team:
- Data Scientist
Requires a deep understanding of statistics and of machine learning techniques and algorithms. Typically has an MS or PhD degree. Experienced in data manipulation and modeling…
Experienced with: Python, R, SQL, scikit-learn, pandas, TensorFlow, Keras, Jupyter Notebook…
- Data Analyst
Collects data and performs statistical analysis on it. Helps business owners make better decisions. Turns data into actionable insights.
Experienced with: Python, R, SQL, BI tools like Tableau, data visualization
- Data Engineer
Works in partnership with the Data Scientist. Creates data pipelines. Thinks about architecture, scalability, pushing models to production…
Experienced with: Docker, Python, Apache Spark, DB, Airflow, Git, AWS, Kafka…
- Machine Learning Engineer
In my experience, companies expect this to be both a Data Scientist and a Data Engineer in one person. That can be tricky when you prefer to focus more on data science and less on the engineering part. I believe this is justified only in the early stages of a project, and for startups and small, young companies.